EpiLoc: A (Working) Text-Based System for Predicting Protein Subcellular Location
Abstract
MOTIVATION: Predicting the subcellular location of proteins is an active research area, as a protein's location within the cell provides meaningful cues about its function. Previous attempts to use text for protein subcellular location prediction have varied in method, applicability, and performance. In earlier work, we used a preliminary text classification system and focused on integrating text features into a sequence-based classifier to improve location prediction performance.
RESULTS: Here the focus shifts to the text-based component itself. We introduce EpiLoc, a comprehensive text-based localization system. We provide an in-depth study of text-feature selection and examine several new ways to associate text with proteins, so that text-based location prediction can be performed for practically any protein. We show that EpiLoc's performance is comparable to (and may even exceed) that of state-of-the-art sequence-based systems. EpiLoc is available at: http://epiloc.cs.queensu.ca.
Similar resources
Improving subcellular localization prediction using text classification and the gene ontology
MOTIVATION Each protein performs its functions within some specific locations in a cell. This subcellular location is important for understanding protein function and for facilitating its purification. There are now many computational techniques for predicting location based on sequence analysis and database information from homologs. A few recent techniques use text from biological abstracts: ...
A System for Predicting Subcellular Localization of Yeast Genome Using Neural Network
The subcellular location of a protein can provide valuable information about its function. With the rapid increase of sequenced genomic data, the need for an automated and accurate tool to predict subcellular localization becomes increasingly important. Many efforts have been made to predict protein subcellular localization. This paper...
SubLoc: a server/client suite for protein subcellular location based on SOAP
Based on SOAP (Simple Object Access Protocol) technology, the SubLoc server/client suite offers a user-friendly interface for searching and predicting protein subcellular location.
Extracting information from text and images for location proteomics
There is extensive interest in automating the collection, organization and summarization of biological data. Data in the form of figures and accompanying captions in literature present special challenges for such efforts. Based on our previously developed search engines to find fluorescence microscope images depicting protein subcellular patterns, we introduced text mining and Optical Character...
Text as data: using text-based features for proteins representation and for computational prediction of their characteristics.
The current era of large-scale biology is characterized by a fast-paced growth in the number of sequenced genomes and, consequently, by a multitude of identified proteins whose function has yet to be determined. Simultaneously, any known or postulated information concerning genes and proteins is part of the ever-growing published scientific literature, which is expanding at a rate of over a mil...
Journal: Pacific Symposium on Biocomputing
Publication date: 2008